information granule
Generation of Granular-Balls for Clustering Based on the Principle of Justifiable Granularity
Jia, Zihang, Zhang, Zhen, Pedrycz, Witold
Efficient and robust data clustering remains a challenging task in the field of data analysis. Recent efforts have explored the integration of granular-ball (GB) computing with clustering algorithms to address this challenge, yielding promising results. However, existing methods for generating GBs often rely on single indicators to measure GB quality and employ threshold-based or greedy strategies, potentially leading to GBs that do not accurately capture the underlying data distribution. To address these limitations, this article introduces a novel GB generation method. The originality of this method lies in leveraging the principle of justifiable granularity to measure the quality of a GB for clustering tasks. To be precise, we define the coverage and specificity of a GB and introduce a comprehensive measure for assessing GB quality. Utilizing this quality measure, the method incorporates a binary tree pruning-based strategy and an anomaly detection method to determine the best combination of sub-GBs for each GB and identify abnormal GBs, respectively. Compared to previous GB generation methods, the new method maximizes the overall quality of generated GBs while ensuring alignment with the data distribution, thereby enhancing the rationality of the generated GBs. Experimental results obtained from both synthetic and publicly available datasets underscore the effectiveness of the proposed GB generation method, showcasing improvements in clustering accuracy and normalized mutual information.
A Novel Granular-Based Bi-Clustering Method of Deep Mining the Co-Expressed Genes
Xu, Kaijie, Pedrycz, Witold, Li, Zhiwu, Quan, Yinghui, Nie, Weike
Traditional clustering methods are limited when dealing with huge and heterogeneous groups of gene expression data, which motivates the development of bi-clustering methods. Bi-clustering methods are used to mine bi-clusters whose subsets of samples (genes) are co-regulated under their test conditions. Studies show that mining bi-clusters of consistent trends and trends with similar degrees of fluctuations from the gene expression data is essential in bioinformatics research. Unfortunately, traditional bi-clustering methods are not fully effective in discovering such bi-clusters. Therefore, we propose a novel bi-clustering method by involving here the theory of Granular Computing. In the proposed scheme, the gene data matrix, considered as a group of time series, is transformed into a series of ordered information granules. With the information granules we build a characteristic matrix of the gene data to capture the fluctuation trend of the expression value between consecutive conditions to mine the ideal bi-clusters. The experimental results are in agreement with the theoretical analysis, and show the excellent performance of the proposed method.
Augmentation of the Reconstruction Performance of Fuzzy C-Means with an Optimized Fuzzification Factor Vector
Xu, Kaijie, Pedrycz, Witold, Li, Zhiwu
Information granules have been considered to be the fundamental constructs of Granular Computing (GrC). As a useful unsupervised learning technique, Fuzzy C-Means (FCM) is one of the most frequently used methods to construct information granules. The FCM-based granulation-degranulation mechanism plays a pivotal role in GrC. In this paper, to enhance the quality of the degranulation (reconstruction) process, we augment the FCM-based degranulation mechanism by introducing a vector of fuzzification factors (fuzzification factor vector) and setting up an adjustment mechanism to modify the prototypes and the partition matrix. The design is regarded as an optimization problem, which is guided by a reconstruction criterion. In the proposed scheme, the initial partition matrix and prototypes are generated by the FCM. Then a fuzzification factor vector is introduced to form an appropriate fuzzification factor for each cluster to build up an adjustment scheme of modifying the prototypes and the partition matrix. With the supervised learning mode of the granulation-degranulation process, we construct a composite objective function of the fuzzification factor vector, the prototypes and the partition matrix. Subsequently, the particle swarm optimization (PSO) is employed to optimize the fuzzification factor vector to refine the prototypes and develop the optimal partition matrix. Finally, the reconstruction performance of the FCM algorithm is enhanced. We offer a thorough analysis of the developed scheme. In particular, we show that the classical FCM algorithm forms a special case of the proposed scheme. Experiments completed for both synthetic and publicly available datasets show that the proposed approach outperforms the generic data reconstruction approach.
Unsupervised Fuzzy eIX: Evolving Internal-eXternal Fuzzy Clustering
Aguiar, Charles, Leite, Daniel
Time-varying classifiers, namely, evolving classifiers, play an important role in a scenario in which information is available as a never-ending online data stream. We present a new unsupervised learning method for numerical data called evolving Internal-eXternal Fuzzy clustering method (Fuzzy eIX). We develop the notion of double-boundary fuzzy granules and elaborate on its implications. Type 1 and type 2 fuzzy inference systems can be obtained from the projection of Fuzzy eIX granules. We perform the principle of the balanced information granularity within Fuzzy eIX classifiers to achieve a higher level of model understandability. Internal and external granules are updated from a numerical data stream at the same time that the global granular structure of the classifier is autonomously evolved. A synthetic nonstationary problem called Rotation of Twin Gaussians shows the behavior of the classifier. The Fuzzy eIX classifier could keep up with its accuracy in a scenario in which offline-trained classifiers would clearly have their accuracy drastically dropped.
An Effective Multi-Resolution Hierarchical Granular Representation based Classifier using General Fuzzy Min-Max Neural Network
Khuat, Thanh Tung, Chen, Fang, Gabrys, Bogdan
Motivated by the practical demands for simplification of data towards being consistent with human thinking and problem solving as well as tolerance of uncertainty, information granules are becoming important entities in data processing at different levels of data abstraction. This paper proposes a method to construct classifiers from multi-resolution hierarchical granular representations (MRHGRC) using hyperbox fuzzy sets. The proposed approach forms a series of granular inferences hierarchically through many levels of abstraction. An attractive characteristic of our classifier is that it can maintain relatively high accuracy at a low degree of granularity based on reusing the knowledge learned from lower levels of abstraction. In addition, our approach can reduce the data size significantly as well as handling the uncertainty and incompleteness associated with data in real-world applications. The construction process of the classifier consists of two phases. The first phase is to formulate the model at the greatest level of granularity, while the later stage aims to reduce the complexity of the constructed model and deduce it from data at higher abstraction levels. Experimental outcomes conducted comprehensively on both synthetic and real datasets indicated the efficiency of our method in terms of training time and predictive performance in comparison to other types of fuzzy min-max neural networks and common machine learning algorithms.
Non-iterative Label Propagation in Optimal Leading Forest
Graph based semi-supervised learning (GSSL) has intuitive representation and can be improved by exploiting the matrix calculation. However, it has to perform iterative optimization to achieve a preset objective, which usually leads to low efficiency. Another inconvenience lying in GSSL is that when new data come, the graph construction and the optimization have to be conducted all over again. We propose a sound assumption, arguing that: the neighboring data points are not in peer-to-peer relation, but in a partial-ordered relation induced by the local density and distance between the data; and the label of a center can be regarded as the contribution of its followers. Starting from the assumption, we develop a highly efficient non-iterative label propagation algorithm based on a novel data structure named as optimal leading forest (LaPOLeaF). The major weaknesses of the traditional GSSL are addressed by this study. We further scale LaPOLeaF to accommodate big data by utilizing block distance matrix technique, parallel computing, and Locality-Sensitive Hashing (LSH). Experiments on large datasets have shown the promising results of the proposed methods.
Aggregation of Classifiers: A Justifiable Information Granularity Approach
Nguyen, Tien Thanh, Pham, Xuan Cuong, Liew, Alan Wee-Chung, Pedrycz, Witold
In this study, we introduce a new approach to combine multi-classifiers in an ensemble system. Instead of using numeric membership values encountered in fixed combining rules, we construct interval membership values associated with each class prediction at the level of meta-data of observation by using concepts of information granules. In the proposed method, uncertainty (diversity) of findings produced by the base classifiers is quantified by interval-based information granules. The discriminative decision model is generated by considering both the bounds and the length of the obtained intervals. We select ten and then fifteen learning algorithms to build a heterogeneous ensemble system and then conducted the experiment on a number of UCI datasets. The experimental results demonstrate that the proposed approach performs better than the benchmark algorithms including six fixed combining methods, one trainable combining method, AdaBoost, Bagging, and Random Subspace.
An improved approach to attribute reduction with covering rough sets
Wang, Changzhong, Sun, Baiqing, Hu, Qinhua
Attribute reduction is viewed as an important preprocessing step for pattern recognition and data mining. Most of researches are focused on attribute reduction by using rough sets. Recently, Tsang et al. discussed attribute reduction with covering rough sets in the paper [E. C.C. Tsang, D. Chen, Daniel S. Yeung, Approximations and reducts with covering generalized rough sets, Computers and Mathematics with Applications 56 (2008) 279-289], where an approach based on discernibility matrix was presented to compute all attribute reducts. In this paper, we provide an improved approach by constructing simpler discernibility matrix with covering rough sets, and then proceed to improve some characterizations of attribute reduction provided by Tsang et al. It is proved that the improved discernible matrix is equivalent to the old one, but the computational complexity of discernible matrix is greatly reduced.